MCODE: Multivariate Conditional Outlier Detection

Authors

  • Charmgil Hong
  • Milos Hauskrecht
Abstract

Outlier detection aims to identify unusual data instances that deviate from expected patterns. It is particularly challenging when outliers are context dependent and when they are defined by unusual combinations of values across multiple outcome variables. In this paper, we develop and study a new conditional outlier detection approach for multivariate outcome spaces that works by (1) transforming the conditional detection problem into an outlier detection problem in a new (unconditional) space and (2) defining outlier scores by analyzing the data in the new space. Our approach relies on the classifier chain decomposition of the multidimensional classification problem, which lets us transform the output space into a probability vector with one probability for each dimension of the output space. Outlier scores applied to these transformed vectors are then used to detect the outliers. Experiments on multiple multidimensional classification problems with different outlier injection rates show that our methodology is robust and able to identify outliers successfully whether they are sparse (manifested in one or very few dimensions) or dense (affecting multiple dimensions).
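
The two steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete inputs and outputs, replaces trained classifiers with smoothed frequency tables, and uses the summed negative log-probability as one possible outlier score. The names CountChain, outlier_score, and alpha are hypothetical and chosen only for this example.

```python
import math
from collections import defaultdict


class CountChain:
    """Toy classifier chain: estimates P(y_i | x, y_1..y_{i-1}) from counts.

    Assumption: x and y are tuples of discrete values. A real system would
    use a trained probabilistic classifier per output dimension instead.
    """

    def __init__(self, n_outputs, alpha=1.0):
        self.n_outputs = n_outputs
        self.alpha = alpha  # Laplace smoothing strength (illustrative choice)
        # One table per output dimension: context -> value -> count
        self.tables = [defaultdict(lambda: defaultdict(float))
                       for _ in range(n_outputs)]
        # Values observed for each output dimension during fitting
        self.values = [set() for _ in range(n_outputs)]

    def fit(self, X, Y):
        for x, y in zip(X, Y):
            for i in range(self.n_outputs):
                # The context is the input plus all earlier outputs in the chain
                context = (tuple(x), tuple(y[:i]))
                self.tables[i][context][y[i]] += 1.0
                self.values[i].add(y[i])
        return self

    def transform(self, x, y):
        """Step 1: map (x, y) to a vector with one probability per output."""
        probs = []
        for i in range(self.n_outputs):
            context = (tuple(x), tuple(y[:i]))
            counts = self.tables[i].get(context, {})
            total = sum(counts.values())
            k = len(self.values[i] | {y[i]})  # number of candidate values
            p = (counts.get(y[i], 0.0) + self.alpha) / (total + self.alpha * k)
            probs.append(p)
        return probs


def outlier_score(probs):
    """Step 2: score the probability vector; higher means more unusual."""
    return -sum(math.log(p) for p in probs)


# Toy data: the outputs copy the single input attribute.
X = [(0,)] * 4 + [(1,)] * 4
Y = [(0, 0)] * 4 + [(1, 1)] * 4
chain = CountChain(n_outputs=2).fit(X, Y)

typical = outlier_score(chain.transform((0,), (0, 0)))
unusual = outlier_score(chain.transform((0,), (1, 1)))
print(typical < unusual)
```

Because the score is computed on the transformed probability vector rather than on the raw attributes, an output combination that is common overall but unexpected for its input still receives a high score. Other scoring functions could be applied to the same vectors.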

Similar resources

Multivariate Conditional Outlier Detection and Its Clinical Application

This paper gives an overview of, and discusses, our recent work on a multivariate conditional outlier detection framework for clinical applications.

Detecting Unusual Input-Output Associations in Multivariate Conditional Data

Despite tremendous progress in outlier detection research in recent years, the majority of existing methods are designed only to detect unconditional outliers that correspond to unusual data patterns expressed in the joint space of all data attributes. Such methods are not applicable when we seek to detect conditional outliers that reflect unusual responses associated with a given context or co...

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

Detection of Abnormal Input-Output Associations

We study a novel outlier detection problem that aims to identify abnormal input-output associations in data, whose instances consist of multi-dimensional input (context) and output (responses) pairs. We present our approach that works by analyzing data in the conditional (input–output) relation space, captured by a decomposable probabilistic model. Experimental results demonstrate the ability o...

Outlier Detection Methods in Multivariate Regression Models

Outlier detection statistics based on two models, the case-deletion model and the mean-shift model, are developed in the context of a multivariate linear regression model. These are generalizations of the univariate Cook’s distance and other diagnostic statistics. Approximate distributions of the proposed statistics are also obtained to get suitable cutoff points for significance tests. In addi...


Journal title:
  • CoRR

Volume: abs/1505.04097

Publication date: 2014